[Feature] A2C objective class and train example #680
Conversation
vmoens
left a comment
Would you mind merging main and trying to solve the issues with the new "next" logic? Let me also know what you think of it :)
* init * strict=False * amend * amend
* Add auto-compute stats feature for ObservationNorm * Fix issue in ObservNorm init function * Quick refactor of ObservationNorm init method * Minor refactoring and adding more tests for ObservationNorm * lint * docstring * docstring Co-authored-by: vmoens <vincentmoens@gmail.com>
* init * [Feature] Nested composite spec (pytorch#654) * [Feature] Move `transform.forward` to `transform.step` (pytorch#660) * transform step function * amend * amend * amend * amend * amend * fixing key names * fixing key names * [Refactor] Transform next remove (pytorch#661) * Refactor "next_" into ("next", ) (pytorch#673) * amend * amend * bugfix * init * strict=False * strict=False * minor * amend * [BugFix] Use GitHub for flake8 pre-commit hook (pytorch#679) * amend * [BugFix] Update to strict select (pytorch#675) * init * strict=False * amend * amend * [Feature] Auto-compute stats for ObservationNorm (pytorch#669) * Add auto-compute stats feature for ObservationNorm * Fix issue in ObservNorm init function * Quick refactor of ObservationNorm init method * Minor refactoring and adding more tests for ObservationNorm * lint * docstring * docstring Co-authored-by: vmoens <vincentmoens@gmail.com> * amend * amend * lint * bf * bf * amend Co-authored-by: Romain Julien <romainjulien@fb.com> Co-authored-by: Romain Julien <romainjulien@fb.com>
Done! I brought in all the changes from main, and the training script now computes the initial stats with the key "observation_vector" instead of "next_observation_vector". The result should be the same, since it is the same tensor delayed by 1 timestep. I also checked that the example script runs without issues.
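For reference, a minimal sketch of what "the same tensor delayed by 1 timestep" means under the new ("next",) convention; the environment construction below is only illustrative, not the actual training script:

```python
# Illustrative sketch only (assumed env setup, not the PR's training script):
# under the new ("next",) convention, the entry at ("next", "observation_vector")
# of step t is the observation of step t + 1, so normalization stats computed
# from either key should be essentially the same.
from torchrl.envs.libs.gym import GymEnv
from torchrl.envs.transforms import CatTensors, TransformedEnv

env = TransformedEnv(
    GymEnv("HalfCheetah-v4"),
    CatTensors(in_keys=["observation"], out_key="observation_vector"),
)
rollout = env.rollout(100)  # single trajectory, no reset in between
assert (
    rollout["observation_vector"][1:]
    == rollout["next", "observation_vector"][:-1]
).all()
```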
It feels like you merged each diff independently, which makes a gigantic diff here (over 50 files changed).
* amend * amend * amend * amend * amend * amend
vmoens
left a comment
LGTM overall. The lint test is failing; running pre-commit should fix that. We should consider adding this to the example test pipeline (#687). After that I think we'll be good to go!
Codecov Report
|          | main   | #680   | +/-    |
|----------|--------|--------|--------|
| Coverage | 87.78% | 87.88% | +0.09% |
| Files    | 119    | 120    | +1     |
| Lines    | 20201  | 20590  | +389   |
| Hits     | 17733  | 18095  | +362   |
| Misses   | 2468   | 2495   | +27    |
vmoens
left a comment
Just a couple of minor changes and we're good to go!
Description
Added an A2C objective class.
I also created the helper functions necessary to run an A2C example, including `make_a2c_loss`, `A2CLossConfig`, `make_a2c_model`, and `A2CModelConfig`.
Creating a `make_a2c_model` helper function was not strictly necessary, since the models are the same as in PPO. However, I wanted to use fewer units in the hidden layers, so I decided to create `make_a2c_model` instead of modifying `make_ppo_model`. The two helpers can probably be merged in the future if necessary, with the network architecture passed as a parameter.
Finally, I played a bit with the parameters in the config.yaml file until I found a good enough configuration that learned pretty well in the HalfCheetah-v4 environment.
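For illustration, here is a minimal sketch of how the new pieces could be wired together in a training loop. The class and argument names used below (`A2CLoss`, `entropy_coef`, `critic_coef`, the loss output keys, and the placeholder networks/collector) are assumptions made for this sketch rather than the exact API added here:

```python
# Rough, illustrative wiring only; names below (A2CLoss, its arguments, the loss
# output keys, and the cfg-built helpers) are assumptions for this sketch and
# may differ from the actual code in this PR.
import torch
from torchrl.objectives import A2CLoss  # objective class added by this PR (import path assumed)

# actor_network / critic_network / collector stand in for what make_a2c_model
# and the data-collector helpers would return.
loss_module = A2CLoss(actor_network, critic_network, entropy_coef=0.01, critic_coef=0.5)
optimizer = torch.optim.Adam(loss_module.parameters(), lr=3e-4)

for batch in collector:          # TensorDict batches of collected experience
    losses = loss_module(batch)  # assumes advantage/value targets are handled upstream
    loss = losses["loss_objective"] + losses["loss_critic"] + losses["loss_entropy"]
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```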
Motivation and Context
There is an open issue about A2C, and while it is similar to REINFORCE and PPO, which are already in the repo, the objective is not the same. In particular, it includes an entropy bonus (which is not present in REINFORCE) and it lacks the log-probability ratio weighting, the clipping, and the KL term present in PPO.
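As a rough, illustrative sketch (not the code added in this PR), the objective described above reduces to three terms:

```python
# Illustrative-only sketch of the objective difference (not the PR's code):
# A2C keeps the plain policy-gradient term weighted by the advantage, adds an
# entropy bonus (absent in REINFORCE), and has no importance ratio, clipping,
# or KL penalty (as used in PPO).
import torch

def a2c_loss(log_prob, advantage, value, value_target,
             entropy, entropy_coef=0.01, critic_coef=0.5):
    loss_objective = -(log_prob * advantage.detach()).mean()   # policy-gradient term
    loss_entropy = -entropy_coef * entropy.mean()              # entropy bonus
    loss_critic = critic_coef * (value - value_target.detach()).pow(2).mean()  # value loss
    return loss_objective + loss_entropy + loss_critic
```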
Types of changes
What types of changes does your code introduce? Remove all that do not apply:
Checklist
Go over all the following points, and put an `x` in all the boxes that apply. If you are unsure about any of these, don't hesitate to ask. We are here to help!